ABSTRACT

We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to 
photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS DR7). Each algorithm 
is defined by a set of parameters which, when varied, produce different final classification trees. We 
extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with
spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the 
completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured 
by the mean completeness in two magnitude intervals: 14 < r < 21 (85.2%) and r > 19 (82.1%). We 
compare the performance of the tree generated with the optimal FT configuration to the classifications 
provided by the SDSS parametric classifier, 2DPHOT and Ball et al. (2006). We find that our FT classifier 
is comparable or better in completeness over the full magnitude range 15 < r < 21, with much lower 
contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the 
only one able to maintain high completeness (>80%) while still achieving low contamination (~ 2.5%). 
Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS 
photometric objects in the magnitude range 14 < r < 21. 

Subject headings: Methods: data analysis - Catalogues - Surveys - Virtual observatory tools 

1. Introduction

Astronomical data acquisition has experienced a
revolution both in quality and complexity during the
last three decades. The main driver has been the
deployment of, and enormous growth in, modern digital
CCD detectors that replaced venerable photographic
plates in the 1980s. Digital images provided by CCDs,
coupled with rapid developments in computation and
data storage, made it possible and even routine to
produce terabytes of astrophysical data in a year.
Moreover, several large scale surveys are being planned for
the next ten years, generating a vast quantity of deep
and wide photometric images. These surveys will
provide data at rates and volumes much greater than any
previous projects. Therefore it is necessary to not only
develop new methods for processing and analyzing
such huge data volumes, but also to ensure that the
techniques applied to extract information from the data
are optimal.

A basic step in the extraction of sensible astronomical
data from photometric images is separating
intrinsically pointlike sources (stars) from extended 
ones (galaxies). Distinguishing between these two 
classes becomes increasingly difficult as sources be- 
come fainter due to the lack of spatial resolution and 
signal-to-noise. Our goal in this work is to test a va- 
riety of decision tree classifiers, and ultimately per- 
form reliable star/galaxy separation for objects from 
the Seventh Data Release of the Sloan Digital Sky 
Survey (SDSS-DR7; Abazajian et al. 2009) based on 
photometric data. We use the SDSS because it also 
contains an enormous number of objects with spectro- 
scopic data (which give the true object classes), and 
because of the quality, consistency and accuracy of its 
photometric data. 

In the 1970s and 1980s, when digitized images be- 
came widespread in astronomy, many authors under- 
took projects to create automated methods to separate 
stars from galaxies. The first efforts relied on purely 
parametric methods, such as the pioneering works of 
Macgillivray et al. (1976), Heydon-Dumbleton et al. 
(1989) and Maddox et al. (1990). Macgillivray et
al. (1976) used a plot of transmission vs. log(area)¹,
fitting a discriminant function to separate stars and
galaxies. Their star/galaxy separation had a completeness
(i.e., the fraction of all galaxies classified as such)
of 95% and a contamination (fraction of non-galaxy 
objects classified as galaxies) of 5-10%. Heydon- 
Dumbleton et al. (1989) performed star/galaxy sepa- 
ration on 200 photographic plates digitized by COS- 
MOS. Rather than use directly measured object at- 
tributes, they generated classification parameters based 
on the data, plotting these as a function of magni- 
tude in bi-dimensional parametric diagrams. In these 
classification spaces they then used an automated pro- 
cedure to derive separation functions. They reached 
98 ± 2% completeness with 8 ± 2% contamination. 
Maddox et al. (1990) used a set of 10 parameters mea- 
sured by APM in 600 digitized photographic plates 
from the UK Schmidt Telescope. They reached 90% 
completeness with 10% contamination at magnitudes 
Bj < 20.5. All of these completeness and contamination
values must be treated with caution, as they are based on
comparison to expected number counts and plate over- 
laps, rather than a spectroscopic "truth" sample. 

1 Decimal logarithm of the area occupied by the object in the image. The
area is measured as the number of squares with side equal to 8 μm.

As the volume of digital data expanded, along with
the available computing power, many authors began
to apply machine learning methods like decision trees
(DT; see Section 3 below) and neural networks to
address star/galaxy separation. Unlike parametric
methods, machine learning methods do not suffer from
the subjective choice of discriminant functions and 
are more efficient at separating stars from galaxies at 
fainter magnitudes (Weir et al. 1995). These meth- 
ods can incorporate a large number of photometric 
measurements, allowing the creation of a classifier 
more accurate than those based on parametric methods.
Weir et al. (1995) applied two different DT algorithms,
the GID3* (Fayyad, 1994) and the O-Btree
(Fayyad & Irani, 1992) as star/galaxy separators for 
images from the Digitized Second Palomar Observa- 
tory Sky Survey (DPOSS), obtaining 90% complete- 
ness and 10% contamination. Odewahn et al. (1999) 
applied a neural network to DPOSS images and estab- 
lished a catalog spanning 1000 square degrees. Ode- 
wahn et al. (2004) used a DT and a neural network to 
separate objects in DPOSS and found that both meth- 
ods have the same accuracy, but the DT consumes less 
time in the learning process. Suchkov et al. (2005)
were the first to apply a DT to separate objects from
the Sloan Digital Sky Survey (SDSS). The authors applied
the oblique decision tree classifier ClassX, based
on OC1, to the SDSS-DR2 (Abazajian et al. 2004). 
They classified objects into stars, red stars (type M or 
later), AGN and galaxies, giving a percentage table of 
correct classifications that allows one to estimate their 
completeness and contamination. Ball et al. (2006) 
applied an axis-parallel decision tree. These authors 
used 477,068 objects from SDSS-DR3 (Abazajian et 
al. 2005) to build the decision tree - the largest training 
set ever used. They obtained a completeness of 93.8% 
for galaxies and 95.4% for stars. 

In this paper we employ a DT machine learning 
algorithm to separate objects from SDSS-DR7 into 
stars and galaxies. We evaluate 13 different DT algo- 
rithms provided by the WEKA (Waikato Environment 
for Knowledge Analysis) data mining tool. We use a 
training data set containing only objects with measured 
spectra. The algorithm with the best performance on 
the training data was then used to separate objects in 
the much larger data set of objects having only pho- 
tometric data. This is the first work published testing 
such a large variety of algorithms and using all of the 
data in the final SDSS data release (see also Ruiz et al. 
2009). 

Improving star/galaxy separation at the faintest 
depths of imaging surveys is not merely an academic
exercise. By significantly improving the completeness 
in faint galaxy samples, and reducing the contamina- 
tion by misclassified stars, many astrophysically im- 
portant questions can be better addressed. Mapping 
the signature of baryon acoustic oscillations requires 
large galaxy samples - and the more complete at higher 
redshift, the better. The measurement of galaxy-galaxy 
correlation functions is of course improved, both by in- 
creasing the number of galaxies used, and by reducing 
the washing out of the signal due to the smooth dis- 
tribution of erroneously classified stars. Weak lensing 
surveys, which need the largest and purest sample of 
background (lensed) galaxies and excellent photometric
redshifts, benefit on both fronts. Similarly, searches
for galaxy clusters using galaxy overdensities increase 
their efficiency when there are fewer contaminant stars 
and more constituent galaxies. Searches for rare ob- 
jects, both stellar and extended, also win with reduced 
contamination, as do any programs which target ob- 
jects for follow-up spectroscopy based on the source 
type. 

For future imaging surveys, optimized classifiers 
will require a new breed of training set. Because they 
cover large sky areas, programs like the Dark Energy 
Survey (DES), the Large Synoptic Survey Telescope 
(LSST) and Pan-STARRS can utilize all available
spectroscopy to create training samples, even to quite faint
magnitudes. Because most spectroscopy has targeted 
galaxies, the inclusion of definite stars must be ac- 
complished in another way. Hubble Space Telescope 
(HST) images have superb resolution and can be used 
to determine the morphological class (star or galaxy) 
of almost all objects observed by HST. Although cov- 
ering only a tiny fraction of the sky, the depth of even 
single orbit HST images and the area overlap with 
these large surveys will provide star/galaxy (and per- 
haps even galaxy morphology) training sets that are 
more than sufficient to implement within an algorithm 
like the one we describe. 

The structure of this paper is as follows. In §2 we
describe the SDSS data used to evaluate the WEKA
algorithms. In §3 we give a brief description of the
DT method and discuss the technique used to choose
the best WEKA decision tree building algorithm. In
§4 we discuss the evaluation process and the results
for each algorithm tested. In §5 we compare our best
star/galaxy separation method to the SDSS parametric
method (York et al. 2000), the 2DPHOT parametric
method (La Barbera et al. 2008) and the axis-parallel
DT used by Ball et al. (2006). We also examine
whether the SDSS parametric classifier can be
improved by modifying the dividing line between stars
and galaxies in the classifier's parameter space. We
summarize our results in §6.

2. The Data 

We used simple Structured Query Language (SQL)
queries to select data from the SDSS Legacy
survey². Objects were selected having r-magnitudes in
the range 14 to 21 mag. We obtained two different data
samples: the spectroscopic, or training, sample and
the application sample. The spectroscopic sample (see
§ 4.1) contains only those objects with both photometric
and spectroscopic measurements, while objects
in the application sample have only photometric
measurements. The spectroscopic sample was obtained
through the following query:
SELECT
    p.objID, p.ra, p.dec, s.specObjID,
    p.psfMag_r, p.modelMag_r, p.petroMag_r,
    p.fiberMag_r, p.petroRad_r, p.petroR50_r,
    p.petroR90_r, p.lnLStar_r, p.lnLExp_r,
    p.lnLDeV_r, p.mE1_r, p.mE2_r, p.mRrCc_r,
    p.type_r, p.type, s.specClass
FROM PhotoObj AS p
    JOIN SpecObj AS s ON s.bestobjid = p.objid
WHERE
    p.modelMag_r BETWEEN 14.0 AND 21.0

This query returned slightly over one million objects
assigned to six different classes according to their
SDSS spectral class³. However, only objects of spectral
class star and galaxy are used, leaving us with
884,378 objects for the spectroscopic sample. The majority
of excluded objects are spectroscopically QSOs
(9.1% of the query results), many of which have one
or more saturated pixels in the photometry. We also
removed 51 stars and 147 galaxies with non-physical
values (e.g. -9999) for some of their photometric
attributes. Finally we excluded 54 objects found to
be repeated SDSS spectroscopic targets, leaving a final
training sample of 884,126 objects, consisting of
84,043 stars and 800,083 galaxies. These all have reliable
SDSS star or galaxy spectral classifications and
meaningful photometric attributes.

2 These queries were written for the SDSS Sky Server database,
which is the DR7 Catalog Archive Server (CAS); see
http://cas.sdss.org/astrodr7/en/. Photometric data were obtained
through the PhotoObj view of the database and spectroscopic data
through the SpecObj view.

3 For more information about spectral class please refer to § 4.1.

The application sample was built similarly to the 
spectroscopic sample with the following query: 
SELECT
    objID, ra, dec, psfMag_r, modelMag_r,
    petroMag_r, fiberMag_r, petroRad_r,
    petroR50_r, petroR90_r, lnLStar_r,
    lnLExp_r, lnLDeV_r, mE1_r, mE2_r,
    mRrCc_r, type_r, type
FROM PhotoObj
WHERE
    modelMag_r BETWEEN 14.0 AND 21.0

This retrieved photometric data for nearly 70 million 
objects from the Legacy survey. We use the Legacy 
survey rather than SEGUE because we are interested in 
classifying distant objects at the faint magnitude limit 
of the SDSS catalog. 

3. The Decision Tree Method 

Machine learning methods are algorithms that al- 
low a computer to distinguish between classes of ob- 
jects in massive data sets by first "learning" from a 
fraction of the data set for which the classes are known 
and well defined - the training set. Machine learning 
methods are essential to search for potentially useful 
information in large, high-dimensional data sets. 

A DT is a well-defined machine learning method 
consisting of nodes which are simple tests on individ- 
ual or combined data attributes. Each possible out- 
come of a test corresponds to an outgoing branch of 
the node, which leads to another node representing an- 
other test and so on. The process continues until a final 
node, called a leaf, is reached. Figure 1 shows a graphical
representation of a simple DT constructed with
50,000 randomly chosen SDSS objects having spec- 
troscopic data. At its topmost node (the root node) 
the tree may branch left or right depending on whether 
the value of the data attribute petroR90 is less than or 
greater than 2.359318. Either of these branches may 
lead to a child node which may test the same attribute, 
a different one, or a combination of attributes. The 
path from the root node to a leaf corresponds to a sin- 
gle classification rule. 

Building up a DT is a supervised learning process, 
i.e., the DT is built node by node based on a data set
where the classes are already known. This dataset is
formed from training examples, each consisting of a 
combination of attribute values that leads to a class. 
The process starts with all training examples in the root 
node of the tree. An attribute is chosen for testing in 
the node. Then, for each possible result of the test a 
branch is created and the dataset is split into subsets of 
training examples that have the attribute values spec- 
ified by the branch. A child node is created for each 
branch and the process is repeated, splitting each sub- 
set into new subsets. Child nodes are created recur- 
sively until all training examples have the same class 
or all the training examples at the node have the same 
values for all the attributes. Each leaf node gives either 
a classification, a set of classifications, or a probabil- 
ity distribution over all possible classifications. The 
main difference between the different algorithms for
constructing a DT is the method(s) employed to select
which attribute or combination of attributes is
tested in a node.
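The recursive construction described above can be sketched in a few lines of Python. This is only a minimal illustration and not any of the WEKA implementations: it assumes numerical attributes, binary threshold tests on single attributes, string class labels, and information gain as the selection criterion. The function names (entropy, best_split, build_tree, classify) are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, attribute index, threshold) of the best binary test, or None."""
    base, best = entropy(labels), None
    for a in range(len(rows[0])):
        values = sorted(set(r[a] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[a] <= t]
            right = [y for r, y in zip(rows, labels) if r[a] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(labels)
            if best is None or gain > best[0]:
                best = (gain, a, t)
    return best

def build_tree(rows, labels):
    """Split recursively until a node is pure or no attribute can separate it."""
    split = None if len(set(labels)) == 1 else best_split(rows, labels)
    if split is None:
        # Leaf: store the majority class of the examples that reached it.
        return Counter(labels).most_common(1)[0][0]
    _, a, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[a] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[a] > t]
    return (a, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))

def classify(tree, row):
    """Follow the path from the root to a leaf for one example."""
    while isinstance(tree, tuple):
        a, t, left, right = tree
        tree = left if row[a] <= t else right
    return tree
```

Each internal node is stored as a tuple (attribute index, threshold, left subtree, right subtree) and each leaf as a class label, so a root-to-leaf path corresponds to one classification rule.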

3.1. WEKA and Tree Construction 

WEKA⁴ is a Java software package for data mining
tasks developed by the University of Waikato, New
Zealand. It consists of a collection of machine learning
algorithms that can either be applied directly or called
from other Java code. WEKA contains tools for
data pre-processing, classification, regression, clustering,
association rules, and visualization.

In this work we use the WEKA DT tools, which include
13 different and independent algorithms for constructing
decision trees⁵. In the following we give a
brief description of each algorithm.

4 http://www.cs.waikato.ac.nz/ml/weka/
5 There are other DT algorithms in WEKA which are not capable of
working with numerical attributes.

J48 is the WEKA implementation of the C4.5 algorithm
(Quinlan, 1993). Given a data set, it
generates a DT by recursive partitioning of the
data. The tree is grown using a depth-first strategy,
i.e., the algorithm calculates the information
gain for all possible tests that can split the
data set and selects a test that gives the greatest
value. This process is repeated for each new
node until a leaf node has been reached.

J48graft generates a grafted DT from a J48 tree. The
grafting technique (Webb, 1999) adds nodes to
an existing DT with the purpose of reducing prediction
errors. This algorithm identifies regions
of the multidimensional space of attributes not
occupied by training examples, or occupied only
by misclassified training examples, and considers
alternative branches for the leaf containing
the region in question. In other words, a new test
will be performed in the leaf, generating new
branches that will lead to new classifications.

BFTree (Best-First decision Tree; Shi, 2007)
has a construction process similar to C4.5. The 
main difference is that C4.5 uses a fixed order 
to build up a node (normally left to right), while 
BFTree uses the best-first order. This method 
first builds the nodes that will lead to the longest 
possible paths (a path is the way from the root 
node to a leaf). 

FT (Functional Trees; Gama, 2004) combines a stan- 
dard univariate DT, such as C4.5, with linear 
functions of the attributes by means of linear re- 
gressions. While a univariate DT uses simple 
value tests on single attributes in a node, FT can 
use linear combinations of different attributes in 
a node or in a leaf. 

LMT (Logistic Model Trees; Landwehr et al. 2006)
builds trees with linear functions in the leaves, as does
the FT algorithm. The main difference is that
instead of using linear regression, LMT uses
logistic regression.

Simple Cart is the WEKA implementation of the 
CART algorithm (Breiman et al. 1984). It is 
similar to C4.5 in the process of tree construc- 
tion, but while C4.5 uses information gain to 
select the best test to be performed on a node, 
CART uses the Gini index. 

REPTree is a fast decision tree learner that builds 
a decision/regression tree using information 
gain/variance as the criterion to select the at- 
tribute to be tested in a node. 

Random tree models have been extensively devel- 
oped in recent years. The WEKA Random Tree 
algorithm builds a tree considering K randomly 
chosen attributes at each node. 

Random Forest (Breiman, 2001) generates an en- 
semble of trees, each built from random sam- 
ples of the training set. The final classification 
is obtained by majority vote. 



NBTree (Naive Bayesian Tree learner algorithm; Ko- 
havi, 1996) generates a hybrid of a Naive- 
Bayesian classifier and a DT classifier. The 
algorithm builds up a tree in which the nodes 
contain univariate tests, as in a regular DT, but 
the leaves contain Naive-Bayesian classifiers.
In the final tree an instance is classified using 
a local Naive Bayes on the leaf in which it 
fell. NBTree frequently achieves higher accu- 
racy than either a Naive Bayesian classifier or a 
DT learner. 

ADTree (Alternating Decision Tree; Freund and Ma- 
son, 1999) is a boosted DT. An ADTree con- 
sists of prediction nodes and splitter nodes. The 
splitter nodes are defined by an algorithm test, 
as, for instance, in C4.5, whereas a prediction 
node is defined by a single value x ∈ R^2. In
a standard tree like C4.5 a set of attributes will 
follow a path from the root to a leaf according 
to the attribute values of the set, with the leaf 
representing the classification of the set. In an 
ADTree the process is similar but there are no 
leaves. The classification is obtained by the sign 
of the sum of all prediction nodes existing in the 
path. Unlike standard trees, a path in an
ADTree begins at a prediction node and ends in 
a prediction node. 

LADTree (Holmes et al. 2001) produces an ADTree 
capable of dealing with data sets containing 
more than 2 classes. The original formulation 
of the ADTree restricted it to binary classifica- 
tion problems; the LADTree algorithm extends 
the ADTree algorithm to the multi-class case 
by splitting the problem into several two-class 
problems. 

Decision Stump is a simple binary DT classifier con- 
sisting of a single node (based on one attribute) 
and two leaves. All attributes used by the other 
trees are tested and the one giving the best
classifications (petroR50 in our case) is chosen for
use in the single node.
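Two of the ideas in the list above, the choice between information gain (C4.5) and the Gini index (CART) and the single-node Decision Stump, can be illustrated together. The sketch below is a hedged Python illustration, not the WEKA code: it assumes binary threshold tests on numerical attributes, and the names gini, entropy and decision_stump are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Impurity used by information gain (C4.5-style), in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index (CART-style): 1 minus the sum of squared class fractions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def decision_stump(rows, labels, impurity=gini):
    """Try every attribute and threshold; keep the test that most reduces impurity."""
    n, parent, best = len(labels), impurity(labels), None
    for a in range(len(rows[0])):
        values = sorted(set(r[a] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[a] <= t]
            right = [y for r, y in zip(rows, labels) if r[a] > t]
            drop = parent - (len(left) * impurity(left)
                             + len(right) * impurity(right)) / n
            if best is None or drop > best["drop"]:
                best = {"drop": drop, "attribute": a, "threshold": t,
                        # The two leaves predict the majority class on each side.
                        "left": Counter(left).most_common(1)[0][0],
                        "right": Counter(right).most_common(1)[0][0]}
    return best
```

Passing impurity=entropy instead of the default switches the criterion from the Gini index to information gain; on cleanly separable data both select the same test, and they can differ when classes overlap.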

3.2. Accuracy and Performance: the Cross-Validation Method

The accuracy of any method for star/galaxy separa- 
tion depends on the apparent magnitude of the objects 
and is often measured using the Completeness func- 
tion CP(m) (fraction of all galaxies classified as such) 
and the Contamination function CT(m) (fraction of all
stars classified as galaxies) in a magnitude interval δm.
These are defined as:

$$ CP(m) = 100 \times \frac{N_{\mathrm{gal-gal}}(m)\,\delta m}{N^{\mathrm{tot}}_{\mathrm{gal}}(m)\,\delta m} \qquad (1) $$

and

$$ CT(m) = 100 \times \frac{N_{\mathrm{star-gal}}(m)\,\delta m}{N^{\mathrm{tot}}_{\mathrm{star}}(m)\,\delta m} \qquad (2) $$

where $N_{\mathrm{gal-gal}}(m)\,\delta m$ is the number of galaxy images
correctly identified as galaxies within the magnitude
interval $(m - \delta m/2, m + \delta m/2)$; $N_{\mathrm{star-gal}}(m)\,\delta m$ is the
number of star images falsely identified as galaxies;
$N^{\mathrm{tot}}_{\mathrm{gal}}(m)\,\delta m$ is the total number of galaxies and
$N^{\mathrm{tot}}_{\mathrm{star}}(m)\,\delta m$ the total number of stars.

It is also useful to define the mean values of these
functions within a given magnitude interval. Thus
for the mean completeness we have
$\langle\mathrm{Compl}\rangle_{\Delta m} = (1/\Delta m) \sum_i CP(m_i)\,\delta m_i$,
with $\Delta m = \sum_i \delta m_i$. A similar
definition holds for the mean contamination. Note that
$\langle\mathrm{Compl}\rangle_{\Delta m} \cdot \Delta m$ gives the area under the completeness
function in the interval $\Delta m$. Unless otherwise
stated, we calculate the completeness and contamination
functions using a constant bin width $\delta m = 0.5$ mag.
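As an illustration of these definitions, the following Python sketch computes the completeness and contamination in bins of constant width and the mean completeness over a magnitude interval. It follows the prose definitions given above (galaxies classified as galaxies over all galaxies; stars classified as galaxies over all stars). The function names, the string labels "star" and "galaxy", and the default magnitude limits are assumptions made for the example.

```python
def completeness_contamination(mags, true_cls, pred_cls,
                               m_min=14.0, m_max=21.0, dm=0.5):
    """Return a list of (bin centre, CP, CT) in percent, one entry per bin.

    CP or CT is None in bins with no galaxies or no stars, respectively.
    """
    n_bins = int(round((m_max - m_min) / dm))
    gal_gal = [0] * n_bins    # galaxies classified as galaxies
    star_gal = [0] * n_bins   # stars classified as galaxies
    n_gal = [0] * n_bins      # all galaxies
    n_star = [0] * n_bins     # all stars
    for m, t, p in zip(mags, true_cls, pred_cls):
        i = int((m - m_min) // dm)
        if not 0 <= i < n_bins:
            continue  # outside the magnitude range
        if t == "galaxy":
            n_gal[i] += 1
            gal_gal[i] += (p == "galaxy")
        elif t == "star":
            n_star[i] += 1
            star_gal[i] += (p == "galaxy")
    curve = []
    for i in range(n_bins):
        cp = 100.0 * gal_gal[i] / n_gal[i] if n_gal[i] else None
        ct = 100.0 * star_gal[i] / n_star[i] if n_star[i] else None
        curve.append((m_min + (i + 0.5) * dm, cp, ct))
    return curve

def mean_completeness(curve, lo, hi, dm=0.5):
    """Bin-width-weighted mean of CP over bins whose centres lie in [lo, hi)."""
    vals = [cp for m, cp, _ in curve if lo <= m < hi and cp is not None]
    return sum(v * dm for v in vals) / (dm * len(vals)) if vals else None
```

With a constant bin width the weighted mean reduces to a plain average of the per-bin values; the weights are kept explicit to mirror the definition of the mean completeness.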

Our first goal is to find the best performing DT al- 
gorithm among those described in Section 3.1 in terms 
of accuracy, especially at faint magnitudes. However, 
for large data sets, the processing time is also a concern 
in this evaluation. 

There are various approaches to determining the 
performance of a DT. The most common approach is to 
split the training set into two subsets, usually in a 4:1
ratio, and construct a tree with the larger subset and
apply it to the smaller. A more sophisticated method is
called Cross-Validation (CV; Witten & Frank, 2000).
The CV method, which is used here, consists of split- 
ting the training set into 20 subsamples, each with the 
same distribution of classes as the full training set. 
While the number of subsamples, 20, is arbitrary, each 
subsample must provide a large training set for the CV 
method. For each subsample a DT is built and
applied to the other 19 subsamples. The resulting
completeness and contamination functions are then
collected, and the median and dispersion over all subsets
are computed. This gives the cross-validation estimate of
the robustness in terms of a completeness function.
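The procedure just described (stratified subsamples, a tree built on each subsample and applied to the remaining ones) can be sketched as follows. This is a simplified Python illustration under stated assumptions: train and score are user-supplied, hypothetical callables, score returns a single number rather than a full completeness function, and the dispersion is taken to be the population standard deviation.

```python
import random
import statistics
from collections import defaultdict

def stratified_subsamples(labels, k=20, seed=0):
    """Split example indices into k subsamples with (nearly) the same class mix."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)  # deal each class out evenly across subsamples
    return folds

def cross_validate(rows, labels, train, score, k=20, seed=0):
    """Build a model on each subsample and score it on the other k-1 subsamples."""
    results = []
    for fold in stratified_subsamples(labels, k, seed):
        held = set(fold)
        model = train([rows[i] for i in fold], [labels[i] for i in fold])
        rest = [i for i in range(len(rows)) if i not in held]
        results.append(score(model,
                             [rows[i] for i in rest],
                             [labels[i] for i in rest]))
    # Median and dispersion over the k runs.
    return statistics.median(results), statistics.pstdev(results)
```

Note that this follows the text in training on a single subsample and testing on the rest, which is the reverse of the more common k-fold convention of training on k-1 folds.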



4. Star/Galaxy Separation for the SDSS 

The spectroscopic information from SDSS
provides the true classification - star or galaxy - of
an object. Despite the size of the SDSS spectroscopic
sample (~1 million objects), it represents only a tiny
fraction of all objects in the SDSS DR7 photometry
(230 million). How can we classify the SDSS objects
for which there is no spectroscopic data? The SDSS
pipeline already produces a classification using a parametric
method based on the difference between the
magnitudes psfMag and modelMag (see § 4.1). However
it is known (see Figure 6 below) that this classification
is not very accurate at magnitudes fainter than
19.0.

We will take advantage of the large spectroscopic 
sample from SDSS, for which we know the correct 
classes for all objects, to train a DT to classify SDSS 
objects based only on their photometric attributes. We 
expect that by using such a vast training set, the result- 
ing DT will be capable of maintaining good accuracy 
even at faint magnitude. 

4.1. Attributes 

We selected 13 SDSS photometric attributes and a
single spectroscopic attribute (specClass), as shown
in Table 1.

Table 1: SDSS-DR7 attributes used for star/galaxy separation.

Attribute                                CAS Variable
PSF Magnitude                            psfMag
Fiber Magnitude                          fiberMag
Petrosian Magnitude                      petroMag
Model Magnitude                          modelMag
Petrosian Radius                         petroRad
Radius carrying 50% of Petrosian flux    petroR50
Radius carrying 90% of Petrosian flux    petroR90
Likelihood PSF                           lnLStar
Likelihood Exponential                   lnLExp
Likelihood de Vaucouleurs                lnLDeV
Adaptive Moments                         mRrCc, mE1 and mE2
Spectroscopic classification             specClass

This set of photometric attributes is the same for
both the spectroscopic (training) and the application
samples. While one could ask what set of attributes
produces the most accurate star/galaxy separation, the
enormous variety of attributes measured by SDSS for
each photometric object places examination of that
question beyond the scope of this work. We instead
focus on those attributes that are known or expected to
strongly correlate with the object classification. These
attributes are:

• The PSF magnitude (psfMag), described in detail
in Stoughton et al. (2002), obtained by fitting
a Point Spread Function (PSF) Gaussian
model to the brightness distribution of the object.
We expect the PSF magnitude to be a good
flux measure for stars, but it tends to overestimate
the flux of extended objects due to their
irregular shapes.

• The fiber magnitude (fiberMag) is the flux contained
within the 3" diameter aperture of a
spectroscopic fiber.

• The Petrosian magnitude (petroMag) is a flux
measure proposed by Petrosian (1976). He defined
a function η(r) representing the ratio between
the mean surface brightness within a specific
radius and the surface brightness at this radius.
For a given value of η one can define a
Petrosian radius (petroRad); the flux measured
within this radius is the Petrosian magnitude.
Note that the SDSS pipeline adopts a modified
form of the Petrosian (1976) system, as detailed
in Yasuda et al. (2001).

• The SDSS pipeline fits two different galaxy
models to the two-dimensional image of an object,
in each band: a de Vaucouleurs profile and
an exponential profile. The model magnitude
(modelMag) is taken from the better fitting of
these two models⁶.

• The attributes petroR50 and petroR90 are the
radii containing 50% and 90% of the Petrosian
flux for each band. These two attributes are not
corrected for seeing and this may cause the surface
brightness of objects of size comparable to
the PSF to be underestimated. Nevertheless, the
amplitude of these effects is not yet well characterized,
and machine learning algorithms may
still find relationships distinguishing stars from
galaxies.

• The model likelihoods lnLStar, lnLExp and
lnLDeV are the probabilities that an object
would have at least the measured value of chi-squared
if it were well represented by one of
the SDSS surface brightness models: PSF,
exponential or de Vaucouleurs, respectively.

• The adaptive moments mRrCc, mE1 and mE2 are
second moments of the intensity, measured using
a radial weight function adapted to the shape
and size of an object. A more detailed description
can be found in Bernstein & Jarvis (2002).
Adaptive moments can be a good measure of the
ellipticity.

• The spectroscopic attribute specClass stores
the object spectral classification, which is one of
unknown, star, galaxy, qso, hiz_qso, sky, star_late
or gal_em. Only objects classified as stars or
galaxies are kept in the training set. All other
spectroscopic classes constitute a small fraction
of the objects, and are not uniquely tied to a
specific morphological class.

6 For more details see http://www.sdss.org/dr7/algorithms/photometry.html

4.2. Objective Selection of the Optimal Tree Algorithm

As discussed in Section 3.1, the WEKA data mining
package provides 13 different algorithms to generate
a DT from a training set containing only numerical
attributes. Each algorithm employs different computational
procedures using different sets of internal
parameters⁷, resulting in construction of distinct final
trees. For each algorithm, we test various sets of internal
parameters (always with the same input object
attributes). We use the CV procedure (Section 3.2)
to compute the completeness function for each algorithm
and all sets of its internal parameters, to find the
best set of parameters maximizing the completeness.
Then, we compare the performance among the various
algorithms using the optimal internal parameters for
each algorithm. In this section we discuss these tests
and compare their results to find the optimal algorithm
(and its best internal parameters) that will ultimately
be used to provide star-galaxy separation for the entire
DR7.

7 We do not provide a full description of all the parameters involved
in each algorithm; for more detailed descriptions please refer to
http://www.cs.waikato.ac.nz/ml/weka/

Fig. 2: Results of the parameter space exploration for each of the 13 WEKA DT algorithms. The hatched areas
show the loci of the completeness functions for galaxies obtained from the CV procedure.

We first exhaustively explored the internal parameter
space of each algorithm to determine which parameters
significantly change the resulting completeness
functions, and discarded the irrelevant ones⁸.
We tested the sensitivity of the completeness function
to the variation of each parameter taken individually
("single tests") or in combination with others
("combined tests"). For each set of completeness functions
generated by the variation of a given parameter
v, we computed the dispersion $\sigma_{m_i}$ at the midpoints
$m_i = 14.25 + 0.5\,i$ mag and then averaged over all the
intervals to obtain $\sigma_v = (1/N_{\mathrm{intervals}}) \sum_i \sigma_{m_i}$. A
parameter was considered irrelevant whenever $\sigma_v < 5\%$.
This procedure typically allowed us to discard one
parameter per algorithm.
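The relevance test above amounts to a per-bin dispersion followed by an average over bins. A minimal Python sketch is given below; it assumes that the dispersion is the population standard deviation (the text does not specify the estimator) and that all completeness functions are sampled at the same magnitude midpoints. The function names are hypothetical.

```python
import statistics

# Magnitude midpoints of the 14 bins of width 0.5 mag between 14 and 21.
MIDPOINTS = [14.25 + 0.5 * i for i in range(14)]

def parameter_dispersion(curves):
    """Bin-averaged dispersion of a set of completeness functions.

    curves: one completeness function (in percent) per tested value of the
    parameter, each a list sampled at the same midpoints.
    """
    # zip(*curves) groups the values of all curves bin by bin.
    per_bin = [statistics.pstdev(values) for values in zip(*curves)]
    return sum(per_bin) / len(per_bin)

def is_irrelevant(curves, threshold=5.0):
    """A parameter is discarded when its averaged dispersion is below 5%."""
    return parameter_dispersion(curves) < threshold
```

For example, two curves that agree at bright magnitudes and differ by 20 percentage points in a single faint bin can still fall below the threshold once the dispersion is averaged over all bins.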

In the second step we searched for the optimal value 
of each remaining parameter. To do this we first computed
the range for each parameter wherein $\sigma_v < 5\%$,
as before. Then, within these limits we tested each al- 
gorithm to find its optimal parameter set. The results 
of these tests are shown in Figure 2, which displays the 
range of completeness functions computed using the 
CV procedure when varying the internal parameters of 
each algorithm. At bright magnitudes, r < 19, the al- 
gorithms behave very similarly, and their efficiency is 
stable under variation of their internal parameters. 

We then analyzed the relative performance of the
13 algorithms by comparing their completeness and
contamination functions as well as their processing
times when run with their optimal sets of internal
parameters. We define the quantities <Compl>_bright,
<Compl>_faint and Compl_20.75, which are the mean
completenesses (see Section 3.2) in the magnitude
intervals 14 < r < 19, 19 < r < 21 and 20.5 < r < 21,
respectively. The results of this comparison are given
in Table 2. Column 1 gives the algorithm name as in
Section 3.1. Column 2 gives the total number of
significant internal parameters for that algorithm. Column 3
gives the processing time for each algorithm (running
on a 64-bit PC with an AMD Phenom X3 8650 triple-core
2.3 GHz processor), and Columns 4, 5 and 6 give the
quantities <Compl>_bright, <Compl>_faint and Compl_20.75
along with their standard deviations. The rows in
Table 2 have been ordered according to the values of
<Compl>_faint.
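As a rough illustration of how these three summary quantities could be computed from a labelled sample, the Python sketch below bins galaxies in 0.5 mag intervals and averages the binned completeness over a magnitude range. It assumes that completeness is the percentage of spectroscopic galaxies in a bin that the classifier also labels as galaxies (the formal definition is the one in Section 3.2), and that the mean is taken over bins. The function names and label strings are hypothetical and this is not the code used in this work.

```python
from statistics import mean


def completeness_by_bin(mags, true_labels, pred_labels,
                        lo=14.0, hi=21.0, width=0.5, target="galaxy"):
    """Return (midpoints, completeness in %) for objects of class `target`."""
    n_bins = round((hi - lo) / width)
    hits = [0] * n_bins
    totals = [0] * n_bins
    for m, t, p in zip(mags, true_labels, pred_labels):
        if t != target or not (lo <= m < hi):
            continue
        i = int((m - lo) / width)
        totals[i] += 1
        if p == target:
            hits[i] += 1
    midpoints = [lo + (i + 0.5) * width for i in range(n_bins)]
    # Bins without any true `target` object have undefined completeness.
    compl = [100.0 * h / n if n else None for h, n in zip(hits, totals)]
    return midpoints, compl


def mean_completeness(midpoints, compl, r_min, r_max):
    """Average the defined bin completenesses with r_min <= midpoint < r_max."""
    vals = [c for m, c in zip(midpoints, compl)
            if r_min <= m < r_max and c is not None]
    return mean(vals)
```

With these helpers, the bright and faint means correspond to `mean_completeness(mid, compl, 14, 19)` and `mean_completeness(mid, compl, 19, 21)`, and the faintest-bin value to the single bin centred at 20.75.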

It is clear from Table 2 (and Figure 2) that all of
the algorithms have comparable efficiency at brighter
magnitudes (r < 19). However, at r > 19 their
performance varies significantly. The Decision Stump, with
no internal parameters, unsurprisingly performs worst,
although it is the fastest algorithm. The NBTree,
which also has no internal parameters, is considerably
better albeit more computationally expensive. The fast
algorithms, including Simple Cart, J48, J48graft,
REPTree and Random Tree, have mean faint-end
completeness of <Compl>_faint ~ 81% - 83% but with
Compl_20.75 < 70%. The remaining algorithms provide
better faint-end completeness at the cost of increasing
processing times. Note that all of the algorithms are
almost equally robust, as measured from the
dispersions of their mean completeness. We conclude that
the FT algorithm is optimal because of its greater
accuracy at faint magnitudes while still requiring only
modest processing time. In fact, as shown in Table 2,
the FT algorithm is not only the most accurate among
the 13 we tested, but also very robust (ranked second
in dispersion of mean completeness).

8 Note that the Decision Stump and NBTree have no free parameters.

To examine what causes the different success rates
among algorithms, we compared the best (FT) and
worst (Random Tree) performing DTs. We examined
the attribute values for objects where the classifications
from the two methods agreed and where they
disagreed. We find that, for some attributes, the objects
on which the two classifiers agree are clearly separated
from those on which they disagree. In the FT vs.
Random Tree comparison, the separation is greatest
in Petrosian attributes (especially PetroR90 and
petroRad). The exact reasons for this are unclear, but
because each algorithm uses the same attributes, it must
be an artifact of the tree construction. This further
reinforces the need to extensively test classifiers with large
training samples to find the best algorithm.

Table 2: Main results of the comparative study of WEKA algorithms. The columns are the name of the algorithm
tested; the number of parameters of the algorithm that can change the resultant tree; the mean processing time
averaged over all parameter sets tested; the mean completeness averaged over the magnitude interval [14, 19]; the
mean completeness averaged over the magnitude interval [19, 21]; and the completeness in the faintest magnitude
bin (20.5 < r < 21.0).

Algorithm        Number of    Processing Time   <Compl>_bright   <Compl>_faint   Compl_20.75
                 Parameters   (hours)           (%)              (%)             (%)
Decision Stump   0            0.03              99.20(±0.17)     68.06(±1.20)    42.29(±9.64)
NBTree           0            1.12              99.64(±0.16)     79.19(±1.39)    63.55(±14.90)
J48graft         4            0.09              99.74(±0.12)     80.93(±1.16)    65.84(±10.39)
Simple Cart      3            0.05              99.63(±0.16)     81.56(±1.13)    67.06(±9.51)
J48              6            0.08              99.73(±0.12)     81.70(±0.96)    66.30(±7.69)
REPTree          4            0.09              99.50(±0.18)     82.76(±1.09)    69.32(±8.80)
Random Tree      2            0.06              99.50(±0.18)     82.76(±1.09)    69.32(±8.80)
Random Forest    3            1.13              99.77(±0.12)     83.15(±1.14)    70.48(±9.91)
BFTree           5            0.24              99.69(±0.15)     83.18(±1.10)    69.85(±9.55)
ADTree           2            1.42              99.73(±0.12)     83.80(±1.12)    71.88(±9.81)
LMT              2            5.50              99.66(±0.15)     83.91(±1.14)    72.18(±9.39)
LADTree          1            7.90              99.70(±0.14)     84.39(±1.10)    72.74(±9.41)
FT               3            2.50              99.64(±0.15)     84.98(±1.08)    74.04(±8.45)

Fig. 1.- A simple decision tree built with the J48 algorithm
(Witten & Frank, 2000). This tree was trained
with 50,000 objects from the spectroscopic sample, as
described in §2, and has a minimum number of objects
per leaf equal to 50.

4.3. Constructing The Final Decision Tree

Having found that the FT algorithm provides the
best star/galaxy separation performance in
cross-validation tests, we must choose a training set to
construct the final DT to classify objects from the
SDSS-DR7 photometric catalog. As described in Section 2,
the CAS database provides 884,126 objects with star
or galaxy spectral classification, which comprises our
full training sample. However, using this entire data
set within the WEKA implementation of the FT
algorithm requires large amounts of computer memory,
decreasing the overall performance. To see if the
final tree depends strongly on the training set size and
the exact values of each object's attributes, we
performed a test using subsets of the training data while
perturbing the object attributes. For each photometric
attribute discussed in Section 4.1, we generated a
perturbed value,

$$X = X_{\rm obs} + \sigma u , \qquad (3)$$

where $X_{\rm obs}$ is the observed attribute value and u is a
random Gaussian deviate, with the dispersion $\sigma$
computed from the first and fourth quartiles of the
attribute's uncertainty distribution. We then subdivided
the training data with these perturbed attributes into
four samples, each with 221,029 objects, and built 4
different DTs. We required that each subset have the
same class distribution as the full training set. We then
used these four DTs to classify the original training set.
The results are shown in Figure 3, which shows that
the completeness function is unchanged in the
magnitude range 14 < r < 20. At the faintest magnitudes,
20 < r < 21, the completeness function varies slightly,
by < 5%. These results suggest that we can safely
reduce the training set size by a factor of a few with no
significant loss of accuracy. With this reduced training
data set the DT construction is sped up considerably.
This test simultaneously confirms that the final
DT is insensitive to measurement errors on the
individual object attributes.
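The perturbation of Equation (3) and the class-preserving subdivision can be sketched as follows. This is a minimal Python illustration, not the procedure actually run with WEKA: the function names are hypothetical, u is assumed to be drawn from a zero-mean, unit-variance Gaussian, and the subsets are built by dealing the shuffled members of each class out in turn so that every subset keeps (to within one object per class) the class mix of the full sample.

```python
import random
from collections import defaultdict


def perturb(x_obs, sigma, rng):
    """Equation (3): X = X_obs + sigma * u, with u a unit Gaussian deviate."""
    return x_obs + sigma * rng.gauss(0.0, 1.0)


def stratified_split(labels, n_subsets, rng):
    """Split object indices into n_subsets with matching class fractions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subsets = [[] for _ in range(n_subsets)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal the shuffled members of this class out to the subsets in turn.
        for k, idx in enumerate(members):
            subsets[k % n_subsets].append(idx)
    return subsets
```

For example, `stratified_split(labels, 4, random.Random(0))` returns four index lists; a separate tree would then be trained on the perturbed attributes of each list and applied to the unperturbed training set.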

We further tested the dependence of DT classification
success on training set size. In future optical
surveys, we expect that spectroscopic data for the faintest
objects will be at a premium, available for only a small
fraction of the photometric objects. At r > 19, the
fainter part of our training sample, we constructed DTs
using different sized subsamples of the available
training data, ranging from 10% to 100% of the full
training set. We find that the completeness remains
essentially unchanged as long as at least 20% of the
faint-end training sample is used (about 7100 objects). This
result suggests that future deep surveys may be able
to perform accurate star/galaxy separation even with
a modest truth sample for training, as long as there is
sufficient resolution in the images.

To select data to train the final DT, we considered
bright and faint objects separately. At 14 < r < 19
we selected 1/4 of the objects from the spectroscopic
sample, maintaining the same magnitude distribution
as in the full sample, yielding 205,348 objects. For
magnitudes 19 < r < 21 we kept all objects from the
training set. These faint objects are poorly sampled
spectroscopically, so we attempted to provide all
possible information to the FT algorithm to maintain the
completeness function at faint magnitudes.
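One way to realize such a selection is sketched below in Python. It is an illustration only: the function name is hypothetical, and drawing the same fraction at random from each 0.5 mag bin is an assumed way of "maintaining the same magnitude distribution", which the text does not spell out.

```python
import random
from collections import defaultdict


def select_training_indices(r_mags, bright_fraction=0.25, split=19.0,
                            lo=14.0, hi=21.0, bin_width=0.5, seed=0):
    """Keep all faint objects and a fixed fraction of each bright bin."""
    rng = random.Random(seed)
    bright_bins = defaultdict(list)
    selected = []
    for idx, r in enumerate(r_mags):
        if split <= r < hi:
            selected.append(idx)  # every faint object is kept
        elif lo <= r < split:
            bright_bins[int((r - lo) / bin_width)].append(idx)
    for members in bright_bins.values():
        # Same sampling fraction in every bin preserves the magnitude
        # distribution of the bright sample (up to rounding).
        k = round(len(members) * bright_fraction)
        selected.extend(rng.sample(members, k))
    return sorted(selected)
```

Objects outside 14 <= r < 21 are ignored, mirroring the magnitude range over which the final tree is applied.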

The final training set contains 240,712 objects with
13 attributes each, extracted from the original
spectroscopic sample. The resultant decision tree was then
applied to classify objects with r magnitudes between
14 and 21 from the entire SDSS-DR7 Legacy survey
catalog. A portion of the resulting catalog is shown
in Table 3; the full catalog is available as an
electronic table. Column 1 gives the unique SDSS ObjID
and Column 2 contains the SDSS ModelMag
magnitude in r-band. Columns 3 and 4 give type_r and type,
the SDSS classifications using r-band alone and using
all five bands, respectively. The SDSS classifications
are 0=unknown, 3=Galaxy, 6=Star, 8=Sky. Column 5
provides our FT Decision Tree classification, where 1
is a star and 2 is a galaxy. We note that we classified
all objects with 14 < ModelMag_r < 21, regardless of
their SDSS classification. Detections which the SDSS
has classified as type=0 or 8 should be viewed with
caution as there is a high likelihood that they are not
truly astrophysical objects.

In the next section, we present a comparative study 
of our catalog and other catalogs providing star/galaxy 
classification for the SDSS. 




Fig. 3.- The completeness (upper curves) and
contamination (lower curves) functions, plotted against
r-mag, for all four perturbed data sets used to train a
DT with the FT algorithm. Each data set is represented
by a different line type.

Table 3: Star/galaxy classification provided by SDSS and by our FT Decision Tree. Column 1 lists the unique SDSS
ObjID, Column 2 contains the SDSS modelMag magnitude in r-band. Columns 3 and 4 give type_r and type, the SDSS
classifications using r-band alone and using all five bands, respectively. Column 5 provides our FT Decision Tree
classification, where 1 is a star and 2 is a galaxy.

SDSS ObjID           ModelMag_r   Type_r   Type   FT Class
588848900971299281   20.230947    3        3      2
588848900971299284   20.988880    3        3      2
588848900971299293   20.560146    3        3      2
588848900971299297   19.934738    3        3      2
588848900971299302   20.039648    3        3      2
588848900971299310   20.714321    3        3      2
588848900971299313   20.742567    3        3      2
588848900971299314   20.342773    3        3      2
588848900971299315   20.425304    3        3      2
588848900971299331   20.582634    3        3      2


5. Comparison With Other SDSS Classifications 

We compare the results of our DT classification 
of objects in the SDSS photometric catalog with 
those from the parametric classifier used in the SDSS 
pipeline, the 2DPHOT software for image processing 
(La Barbera et al. 2008) and the DT classification from
Ball et al. (2006). These comparisons utilize only ob- 
jects with spectroscopic classifications so that the true 
classes are known. All three methods give classifica- 
tions other than just stars or galaxies. However, as we 
are interested only in stars and galaxies, all samples 
described in this section are exclusively composed of 
objects which were classified by the respective meth- 
ods as star or galaxy. 

5.1. FT Algorithm Versus 2DPHOT Method 

2DPHOT is a general purpose software package 
for automated source detection and analysis in deep 
wide-field images. It provides both integrated and 
surface photometry of galaxies in an image and per- 
forms star-galaxy separation by defining a stellar lo- 
cus in its parametric space (La Barbera et al. 2008). 
The comparison was done for 10,391 objects from the
spectroscopic sample which have been reprocessed by
2DPHOT, with the results presented in Figure 4. We
see that both classifiers have the same completeness
trends with magnitude, but our FT classifier
generates almost no contamination, while contamination in
2DPHOT reaches ~ 40%.




Fig. 4.- Completeness (upper curves) and
contamination (lower curves), plotted against r-mag, for
the sample of 10,391 SDSS objects reprocessed with
2DPHOT and classified as either star or galaxy. The
full lines show the completeness and contamination
functions from our DT classification whereas the
dash-dot lines are for the 2DPHOT classification.

5.2. FT Algorithm Versus Axis-Parallel DT

Ball et al. (2006) were the first to apply the DT
methodology to SDSS data. They use an axis-parallel
DT to assign a probability that an object belongs to
one of three classes: stars, galaxies or nsng (neither
star nor galaxy). The completeness and contamination
functions for both Ball's axis-parallel DT and for our
FT, calculated using a sample of 561,070 objects from
Ball's catalog, are shown in Figure 5. These results
show that our FT tree performs similarly to the
axis-parallel tree in completeness, but the FT generates
lower contamination. At the faint end (r > 19)
our contamination remains constant around 3% while
Ball's catalog has a contamination rate of ~ 9%.

5.3. FT Algorithm Versus SDSS Pipeline Parametric Method

The SDSS pipeline classifies an object into 3
classes: unknown, galaxy or star, using the difference
between the psf and model magnitudes (cf. Section
4.1). If the condition psfMag - modelMag > 0.145 is
satisfied, the object is classified as galaxy; otherwise,
it is classified as a star.
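Written as code, the cut quoted above is a one-line rule. The Python sketch below is illustrative only: the function name and label strings are hypothetical, and the "unknown" class assigned by the pipeline is not modelled.

```python
SDSS_CUT = 0.145  # threshold on psfMag - modelMag quoted in the text


def sdss_parametric_class(psf_mag, model_mag, cut=SDSS_CUT):
    """Return 'galaxy' if psfMag - modelMag exceeds the cut, else 'star'."""
    return "galaxy" if psf_mag - model_mag > cut else "star"
```

Passing a different value for `cut` allows the same rule to be evaluated with an alternative dividing line.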

We first analyzed the behavior of our final DT in
comparison with that of the SDSS pipeline for the
full spectroscopic sample. We note that this will
contain the same objects used to initially train the DT, as
discussed in Section 4.3. However, because only ~ 25%
of the entire sample is used for training, we
neglect its influence on the results. The completeness
and contamination curves are shown in Figure 6 for a
sub-sample of the entire spectroscopic sample where
8,406 objects with at least one saturated pixel have
been removed. The results show that the completeness
of our DT classification stays above 80% for
magnitudes r > 19 with negligible stellar contamination.
In contrast, the completeness for the SDSS pipeline
drops to 60% in the same range, and is highly
contaminated at the brighter magnitudes. These results show
that application of our DT provides a gain in
completeness of ~ 20% at faint magnitudes when
compared with the SDSS pipeline. Our FT tree also yields
much lower contamination than the SDSS parametric
method; while the FT contamination stays around 6%,
the SDSS parametric contamination is about 17% for
magnitudes brighter than 18.

Fig. 5.- Completeness (upper curves) and contamination
(lower curves) for the 561,070 objects with SDSS
spectroscopic data processed by Ball et al. (2006).
The full lines give the completeness and contamination
functions of our DT classification whereas the
dash-dot lines give the same for Ball's classification.

Fig. 6.- The completeness (upper curves) and
contamination (lower curves) functions for the SDSS
parametric method (dash-dotted lines) and for our DT
method (solid lines), applied to the 880,715 objects of
the spectroscopic sample (see text).

Finally we compared our DT classifications and the
SDSS parametric classifications for all objects in the
application sample (see Section 2). Since there is no a
priori true classification in this case (unlike the
training sample), we compare these 2 methods by assuming
that the DT is correct, based on its better performance
(described above). In Figure 7 we show the
completeness and contamination functions for the SDSS
parametric classification assuming that our DT
classification is correct. Only objects that are classified as either
star or galaxy by both the SDSS parametric method
and the FT DT are considered here, with other
classifications in SDSS being an irrelevant fraction (0.05%
of the total sample). We see from this figure that there
is significant disagreement between our DT
classification and the SDSS parametric method when
classifying stars, implying a greater stellar contamination in
the SDSS classification. The completeness shows, at
faint magnitudes 20.5 < r < 21, that the two classifiers
disagree on ~ 6% of the whole application sample.
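A comparison of this kind, in which one classification is taken as the reference, can be sketched in Python as below. The definitions are assumptions made for the illustration (the formal ones are those of Section 3.2): completeness is the percentage of reference galaxies that the tested classifier also calls galaxies, and contamination is the percentage of the tested classifier's galaxies that the reference calls something else. The function name is hypothetical.

```python
def completeness_and_contamination(reference, tested, target="galaxy"):
    """Compare `tested` labels against `reference` labels taken as truth.

    Returns (completeness %, contamination %) for class `target`;
    a value is None when its denominator is zero.
    """
    n_ref = sum(1 for r in reference if r == target)
    n_tested = sum(1 for t in tested if t == target)
    n_both = sum(1 for r, t in zip(reference, tested)
                 if r == target and t == target)
    completeness = 100.0 * n_both / n_ref if n_ref else None
    contamination = (100.0 * (n_tested - n_both) / n_tested
                     if n_tested else None)
    return completeness, contamination
```

Applied per magnitude bin, with the DT labels as `reference` and the parametric labels as `tested`, this yields curves of the kind described for Figure 7.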

5.4. A Simple Test of the SDSS Pipeline Parametric Method

The SDSS parametric method relies on a single
parameter test: if the condition psfMag - modelMag >
0.145 is satisfied, the object is classified as galaxy;
otherwise, it is classified as a star. The cutoff between
the two classes was chosen based on simulated
images of stars and galaxies. However, the choice of
dividing line between stars and galaxies in this
parameter space can be improved using the extensive
spectroscopic training sample. To test this, we retrieved
psfMag - modelMag for the training sample, and used
the Decision Stump to generate a single-node
Decision Tree. In creating such a tree, the Decision Stump
will seek the value of psfMag - modelMag which
maximizes completeness while minimizing
contamination. Surprisingly, we find that the optimal
requirement for an object to be classified as a galaxy is
psfMag - modelMag > 0.376, significantly different
from the value used by SDSS.
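The idea of a single-node tree on one attribute can be illustrated with a short threshold search in Python. This is a simplified stand-in and not the WEKA Decision Stump itself: the split criterion used here (fraction of correctly classified objects) is an assumption chosen for brevity, and the function name is hypothetical.

```python
def best_threshold(values, is_galaxy):
    """Find the cut c maximizing accuracy of the rule 'galaxy if value > c'.

    values: psfMag - modelMag for each object.
    is_galaxy: matching booleans from the spectroscopic classification.
    Returns (cut, accuracy).
    """
    pairs = sorted(zip(values, is_galaxy))
    # Start with a cut below every value: everything is called a galaxy.
    correct = sum(1 for _, g in pairs if g)
    best_cut, best_correct = float("-inf"), correct
    for i in range(len(pairs) - 1):
        value, galaxy = pairs[i]
        # Raising the cut above pairs[i] relabels that object as a star.
        correct += -1 if galaxy else 1
        next_value = pairs[i + 1][0]
        # Only place cuts between distinct attribute values.
        if next_value > value and correct > best_correct:
            best_cut = (value + next_value) / 2.0
            best_correct = correct
    return best_cut, best_correct / len(pairs)
```

Because every candidate cut between adjacent sorted values is scored, the search is exhaustive for this one attribute; a different criterion could be substituted by changing how `correct` is accumulated.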

To see why there is such a large difference, we have
examined images of bright objects misclassified by the
SDSS parametric method, which are responsible for
the high contamination rate shown in Figure 6. We find
that many of these objects are in close pairs. Examples
of nine such objects are shown in Figure 8, where each
panel includes an object misclassified by the SDSS
parametric method. We clearly see that many are either
heavily blended with a companion or close enough to
another object for the value of psfMag - modelMag
used by the SDSS parametric classifier to be
influenced by the neighboring object. This is clearly visible
in Figure 9, which shows psfMag - modelMag as a
function of magnitude for the training sample.
Spectroscopically classified stars are shown in red, while
galaxies are shown in green. A second ridgeline of
stars is clearly seen with psfMag - modelMag ~ 0.3,
all of which are misclassified with the standard SDSS
parametric cut. This issue was already noted for the
SDSS Early Data Release (Stoughton et al. 2002),
where un-deblended star pairs and galaxies with bright
nuclei were reported to be improperly classified.
This further validates our optimal DT classifier, which,
by using a larger number of attributes and not
combining them, is better able to create a rule set for
correctly classifying even those objects with nearby
companions.

We note that it is possible that there exist attributes 
in the SDSS database that we do not employ but which 
could improve classification accuracy when used in a 
DT. It may also be the case that PSF-deconvolved at- 
tributes, such as those measured by 2DPHOT, can im- 
prove performance. Other "indirect" attributes, such as 
colors or combinations of multiple attributes, can also 
be considered, as done by Ball et al. and the SDSS 
parametric method. However, testing all possible sets 
of object attributes is outside the scope of this work. 
Other avenues for further testing include generating 
a Committee Machine, which looks at the outputs of 
multiple classification algorithms and chooses a final 
meta-class for each object based on majority vote or 
some other criterion. Nevertheless, our DT signifi- 
cantly outperforms all other published classifiers ap- 
plied to SDSS photometry. 
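A majority-vote committee of the kind mentioned above could look like the following Python sketch. It is purely illustrative of the voting step: the function name and the "undecided" label for ties are assumptions, and no such committee was built in this work.

```python
from collections import Counter


def committee_vote(predictions, tie_label="undecided"):
    """Return the majority class among the classifiers' predictions.

    predictions: one class label per classifier for a single object.
    A tie for first place yields `tie_label`.
    """
    ranked = Counter(predictions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]
```

Using an odd number of two-class classifiers avoids ties altogether; other criteria, such as weighting each vote by the classifier's completeness, would replace the simple count.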

Fig. 7.- The completeness and contamination for the
SDSS parametric method when our DT classification
is taken as the truth.

Fig. 8.- Postage stamps from SDSS DR7 for nine
objects which are misclassified by the SDSS parametric
method. Almost all are blends or have brighter nearby
companions which affect the photometry.

Fig. 9.- The psfMag - modelMag parameter used
in the standard SDSS classifier is plotted as a
function of magnitude for the spectroscopic training
sample. Stars are shown as red dots, while galaxies are in
green. The dividing lines used by the SDSS classifier
(psfMag - modelMag = 0.145) and derived by the
Decision Stump (psfMag - modelMag = 0.376) are
also shown. The SDSS classifier incorrectly assigns
galaxy classifications to many relatively bright stars,
most of which have nearby neighbors.

6. Summary

We analyzed the star/galaxy separation
performance of 13 different decision tree algorithms from
the publicly available data mining tool WEKA when
applied to SDSS photometric data. This is the first
examination of a public data mining tool applied to
such a large catalog. We show the completeness and
contamination functions for all of the algorithms in an
extensive study of each algorithm's parameter space.
These functions were obtained using cross-validation
tests and demonstrate the capability of each algorithm
to classify objects from training data. Thus, our study
may be used as a guide for astronomers who wish to
apply such data mining algorithms in star-galaxy
separation tasks and other similar data mining problems.
The main results of our work are:



1. 13 different algorithms from WEKA are tested
and Figure 2 shows the locus of the resultant
completeness functions;

2. All algorithms achieve the same accuracy in the
magnitude range 14 < r < 19, but with large
differences in the required processing time (Table 2
and Figure 2);

3. The completeness functions in the faint
magnitude interval (r > 19) show that the ADTree,
LMT and FT are the most robust and have
similar performance (see Table 2). However, FT
requires approximately half the time of the others
to build a DT;

4. The WEKA FT algorithm is therefore chosen as
the optimal DT for classifying SDSS-DR7
objects based on photometric attributes;

5. We show, using this FT, that reducing the size
of the training data by a factor of ~ 5 does not
significantly change the completeness and
contamination functions (see Figure 3);

6. We use the FT WEKA algorithm to construct a
DT trained with photometric attributes and
spectral classifications for 240,712 objects from the
SDSS-DR7, and apply this DT to separate stars
from galaxies in the Legacy survey sample of
objects in the magnitude range 14 < r < 21
from SDSS-DR7;

7. Finally, we compare our results with the SDSS
parametric method, 2DPHOT and Ball's
axis-parallel DT. Our catalog has much lower
contamination than all three methods (Figures 4,
5 and 6), and a higher completeness than the
SDSS parametric method for faint objects (r >
19).